Search results for "Minimum description length"
Showing 4 of 4 documents
State transition identification in multivariate time series (STIMTS) applied to rotational jump trajectories from single molecules
2018
Time-resolved data from single-molecule experiments often suffer from noise contamination due to a low signal level. Identifying a proper model to describe the data thus requires an approach with enough model parameters that does not misinterpret the noise as relevant data. Here, we report on a generalized data evaluation process to extract states with piecewise constant signal levels from simultaneously recorded multivariate data, typical for multichannel single-molecule experiments. The method employs the minimum description length principle to avoid overfitting: its objective function is based on a trade-off between fitting accuracy and model complexity. We val…
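The abstract does not spell out the STIMTS objective function, so what follows is only a minimal Python sketch of the general idea, not the authors' method: a two-part MDL code length for piecewise constant multivariate signals, minimized exactly by dynamic programming over all segmentations. It assumes Gaussian noise with a known standard deviation `sigma` shared by all channels, and a hypothetical model cost of 0.5·log2(n) bits per segment level per channel plus log2(n) bits per segment boundary. The names `mdl_segment` and `segment_cost` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def segment_cost(prefix, prefix_sq, i, j):
    """Residual sum of squares of samples i..j-1 around their per-channel mean."""
    m = j - i
    total = 0.0
    for s_col, sq_col in zip(prefix, prefix_sq):
        s = s_col[j] - s_col[i]
        total += (sq_col[j] - sq_col[i]) - s * s / m
    return max(total, 0.0)  # guard against tiny negative round-off


def mdl_segment(data: Sequence[Sequence[float]], sigma: float) -> List[Tuple[int, int]]:
    """Split data[t][c] (time t, channel c) into half-open constant segments (start, end)."""
    n, d = len(data), len(data[0])
    prefix = [[0.0] * (n + 1) for _ in range(d)]
    prefix_sq = [[0.0] * (n + 1) for _ in range(d)]
    for t, row in enumerate(data):
        for c, x in enumerate(row):
            prefix[c][t + 1] = prefix[c][t] + x
            prefix_sq[c][t + 1] = prefix_sq[c][t] + x * x

    # Model part (assumed): d levels at 0.5*log2(n) bits each plus log2(n) bits for the boundary.
    penalty = (0.5 * d + 1.0) * math.log2(n)
    # Data part: Gaussian negative log-likelihood in bits, dropping terms that do not depend on the model.
    to_bits = 1.0 / (2.0 * sigma * sigma * math.log(2))

    best = [0.0] + [math.inf] * n  # best[j]: shortest total code length for data[:j]
    back = [0] * (n + 1)           # back[j]: start of the last segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + penalty + segment_cost(prefix, prefix_sq, i, j) * to_bits
            if cost < best[j]:
                best[j], back[j] = cost, i

    segments, j = [], n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1]


# Toy two-channel trace: three states plus small deterministic "noise".
noise = [0.1 * ((t * 7) % 5 - 2) for t in range(60)]
levels = [(0.0, 0.0)] * 20 + [(5.0, -3.0)] * 20 + [(1.0, 2.0)] * 20
data = [(a + e, b - e) for (a, b), e in zip(levels, noise)]
print(mdl_segment(data, sigma=0.15))  # → [(0, 20), (20, 40), (40, 60)]
```

Each extra segment must "pay" for itself by shortening the residual code by more than `penalty` bits, which is how the trade-off between fitting accuracy and model complexity appears here. The exact search costs O(n²·d) time, which is fine for a sketch but slow for long traces.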
Discovering unbounded unions of regular pattern languages from positive examples
1996
The problem of learning unions of certain pattern languages from positive examples is considered. We restrict ourselves to the regular patterns, i.e., patterns in which each variable symbol can appear only once, and to the substring patterns, a subclass of regular patterns of the form xαy, where x and y are variables and α is a string of constant symbols. We present an algorithm that, given a set of strings, finds a good collection of patterns covering this set. A ‘good covering’ is defined as the most probable collection of patterns present in the examples under a simple probabilistic model or, equivalently, by the Minimum Description Length (MDL) principle. Ou…
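The abstract does not give the algorithm itself, so here is a hedged Python sketch of the MDL idea for substring patterns xαy; it is a simple greedy search, not the paper's algorithm. Assumptions, all illustrative: the variables x and y may be replaced by empty strings, the empty constant part `""` stands for the trivial pattern x so every example stays covered, and the code lengths (about 2·log2 of a length, log2 of the alphabet size per character, log2 of the number of patterns to name one) are a made-up but consistent coding scheme. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def pattern_bits(alpha: str, bits_per_char: float) -> float:
    """Assumed code for a constant part alpha: ~2*log2(len + 1) bits for its length plus its characters."""
    return 2.0 * math.log2(len(alpha) + 1) + len(alpha) * bits_per_char


def example_bits(s: str, patterns: List[str], bits_per_char: float) -> float:
    """Name a pattern x+alpha+y matching s, then the split point and the characters substituted for x and y."""
    best = math.inf
    for alpha in patterns:
        if alpha in s:  # with possibly empty x and y, s matches x alpha y iff alpha is a substring of s
            rest = len(s) - len(alpha)
            best = min(best, math.log2(rest + 1) + rest * bits_per_char)
    return math.log2(len(patterns)) + best


def total_bits(examples, patterns, bits_per_char):
    """Two-part description length: the patterns plus the examples encoded with them."""
    return (sum(pattern_bits(a, bits_per_char) for a in patterns)
            + sum(example_bits(s, patterns, bits_per_char) for s in examples))


def greedy_mdl_cover(examples: Sequence[str]) -> List[str]:
    bits_per_char = math.log2(max(len(set("".join(examples))), 2))
    candidates = {s[i:j] for s in examples for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    patterns = [""]  # "" stands for the trivial pattern x, so every example is always covered
    current = total_bits(examples, patterns, bits_per_char)
    while True:
        best_gain, best_alpha = 0.0, None
        for alpha in sorted(candidates.difference(patterns)):  # sorted for deterministic tie-breaking
            gain = current - total_bits(examples, patterns + [alpha], bits_per_char)
            if gain > best_gain:
                best_gain, best_alpha = gain, alpha
        if best_alpha is None:  # no single added pattern shortens the description
            return patterns[1:]
        patterns.append(best_alpha)
        current -= best_gain


examples = ["xqGATTACAz", "GATTACAyy", "wGATTACA", "pqrGATTACAst"]
print(greedy_mdl_cover(examples))  # → ['GATTACA']
```

A longer shared constant part saves characters in every example it covers, while each new pattern costs its own description plus a slightly longer pattern index for every example; the greedy loop stops once that balance turns negative.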
Textual data compression in computational biology: Algorithmic techniques
2012
In a recent review [R. Giancarlo, D. Scaturro, F. Utro, Textual data compression in computational biology: a synopsis, Bioinformatics 25 (2009) 1575–1586], the impact of textual data compression on the analysis of biological data was organized and presented systematically for the first time. Its main focus was a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used, together with a technical presentation of how well-known notions from information theory have been adapted to work successfully on biological data. Rather surprisingly, the use of data compression is pervasive in computational biology. Starting from…
Complexity Selection of the Self-Organizing Map
2002
This paper describes how the complexity of the Self-Organizing Map can be selected using the Minimum Message Length principle. The use of the method in textual data analysis is also demonstrated.
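The paper's actual MML formula for the Self-Organizing Map is not given in this abstract. As a rough illustration of the principle only, the Python sketch below trains a one-dimensional batch SOM for several map sizes k and keeps the size with the shortest two-part message. Every detail is an assumption: scalar toy data instead of text, Gaussian quantization error with a known `sigma`, 0.5·log2(n) bits per prototype, and log2(k) bits per sample to name its best-matching unit. The functions `train_som_1d`, `message_length`, and `select_som_size` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def train_som_1d(xs: Sequence[float], k: int, iters: int = 30) -> List[float]:
    """Batch training of a one-dimensional SOM with k units on scalar data."""
    lo, hi = min(xs), max(xs)
    w = [lo + (hi - lo) * j / (k - 1) for j in range(k)] if k > 1 else [sum(xs) / len(xs)]
    r0 = max(k / 2.0, 1.0)
    for it in range(iters):
        r = r0 + (0.01 - r0) * it / (iters - 1)  # neighbourhood radius shrinks linearly to 0.01
        bmu = [min(range(k), key=lambda j: abs(x - w[j])) for x in xs]
        new_w = []
        for j in range(k):
            num = den = 0.0
            for x, b in zip(xs, bmu):
                h = math.exp(-((j - b) ** 2) / (2.0 * r * r))  # Gaussian neighbourhood on the map grid
                num += h * x
                den += h
            new_w.append(num / den if den > 1e-12 else w[j])  # keep units that receive no weight
        w = new_w
    return w


def message_length(xs: Sequence[float], w: List[float], sigma: float) -> float:
    """Crude two-part message length in bits (assumed coding scheme, not the paper's MML formula)."""
    n, k = len(xs), len(w)
    model_bits = 0.5 * k * math.log2(n)  # stating each prototype to roughly 1/sqrt(n) precision
    index_bits = n * math.log2(k)        # best-matching unit of every sample
    rss = sum(min((x - wj) ** 2 for wj in w) for x in xs)
    residual_bits = rss / (2.0 * sigma * sigma * math.log(2))  # Gaussian error, constant terms dropped
    return model_bits + index_bits + residual_bits


def select_som_size(xs: Sequence[float], sigma: float, k_max: int = 6) -> Tuple[int, Dict[int, float]]:
    lengths = {k: message_length(xs, train_som_1d(xs, k), sigma) for k in range(1, k_max + 1)}
    return min(lengths, key=lengths.get), lengths


# Toy data: three tight groups around 0, 10 and 20.
xs = [c + 0.1 * ((t * 7) % 5 - 2) for c in (0.0, 10.0, 20.0) for t in range(20)]
best_k, lengths = select_som_size(xs, sigma=0.3)
print(best_k)  # → 3
```

Too few units leave large quantization errors, while too many units pay for extra prototypes and longer unit indices; the shortest message lands at the three-unit map for this toy data.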